Evaluating Information Retrieval Metrics Based on Bootstrap Hypothesis Tests

نویسنده

  • Tetsuya Sakai
چکیده

This paper describes how the bootstrap approach to statistics can be applied to the evaluation of IR effectiveness metrics. More specifically, we describe straightforward methods for comparing the discriminative power of IR metrics based on Bootstrap Hypothesis Tests. Unlike the somewhat ad hoc Swap Method proposed by Voorhees and Buckley, our Bootstrap Sensitivity Methods estimate the overall performance difference required to achieve a given confidence level directly from Bootstrap Hypothesis Test results. We demonstrate the usefulness of our methods using four different data sets (i.e., test collections and submitted runs) from the NTCIR CLIR track series for comparing seven IR metrics, including those that can handle graded relevance and those based on the Geometric Mean. We also show that the Bootstrap Sensitivity results are generally consistent with those based on the more ad hoc methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, there have many metrics for evaluating the search engine, nevertheless various researchers’ proposed new metrics in recent years. Aware of this new metrics is essential to conduct research on evaluation of the search engine field. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating the search engines. Methodology: This is ...

متن کامل

Metrics, Statistics, Tests

This lecture is intended to serve as an introduction to Information Retrieval (IR) effectiveness metrics and their usage in IR experiments using test collections. Evaluation metrics are important because they are inexpensive tools for monitoring technological advances. This lecture covers a wide variety of IR metrics (except for those designed for XML retrieval, as there is a separature lecture...

متن کامل

Statistical inference in retrieval effectiveness evaluation

Evaluation methodology, and particularly its statistical tests associated, plays a central role in the information retrieval domain which maintains a strong empirical tradition. In an effort to evaluate the retrieval effectiveness of a search algorithm, this paper focuses on the average precision over a set of fixed recall values. After reviewing traditional evaluation methodology through the u...

متن کامل

Topics in kernel hypothesis testing

This thesis investigates some unaddressed problems in kernel nonparametrichypothesis testing. The contributions are grouped around three main themes:Wild Bootstrap for Degenerate Kernel Tests. A wild bootstrap method for non-parametric hypothesis tests based on kernel distribution embeddings is pro-posed. This bootstrap method is used to construct provably consistent teststh...

متن کامل

A Wild Bootstrap for Degenerate Kernel Tests

A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes, for which the naive permutation-based bootstrap fails. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis, and nondegener...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007